Code
library(tidyverse)
df <- read_csv(here::here("temp/hemnet_properties_for_readme.csv"))A description of Swedish property data
library(tidyverse)
df <- read_csv(here::here("temp/hemnet_properties_for_readme.csv"))Hello everyone - I hope that you are having fun in Gothenburg! I am sorry that I could not be there to join you - but I am sure that I am having a blast in Oslo at this time.
Kerstin spoke to me briefly about using at modern register data to look at the effect of remote work on location choices in Sweden. For example, we could look at how many people now live further away from the city center, or from where they work, as a result of remote work becoming more prevalent.
It is an elegantly simple idea, and if the data is available, I think would make a neat paper.
There are a number of additions to this - I feel like looking at the JobTech data on the number of roles advertising remote work in the job description would be beneficial.
In addition, we could look at property sales in different areas, and see if there is a correlation between the number of remote work roles and property prices.
Nick Bloom and some colleagues talk about the prevalence of work from home in the US and note that it is still high, even after the pandemic.
They show that it is highest in areas of high population density:

They also find that it is higher for individuals
For my own interest, I have a dataset of property sales in Malmö that could be expanded to the whole country for this purpose.1
Currently I scrape the records of property sales from Hemnet for Malmö each week and store them in a database on GitHub.
The dataset contains information on the final sale price, property type, location, and other features of the property.

df %>%
filter(type == "Lägenhet") %>%
ggplot(aes(x = sale_date, y = price_per_square_meter)) +
geom_point(alpha = .1) +
geom_smooth(group = 1) +
scale_y_continuous(labels = scales::dollar_format(prefix = "SEK ")) +
labs(title = "Price per square meter over time in Malmö",
x = "Sale date",
y = "Price per square meter")
Which areas have the highest price per square meter, on average?
library(ggridges)
# make a ridge plot showing price_per_square_meter by location
df %>%
mutate(location = fct_lump(location, 10)) %>%
# filter(location != "Other") %>%
mutate(location = str_remove(location, ", Malmö kommun")) %>%
filter(!is.na(location)) %>%
mutate(location = fct_reorder(location, price_per_square_meter, .fun = median)) %>%
ggplot(aes(x = price_per_square_meter, y = location, fill = location)) +
stat_binline(geom = "density_ridges") +
theme_minimal() +
theme(legend.position = "none") +
scale_x_continuous(labels = scales::dollar_format(prefix = "SEK ")) +
labs(
title = "Price per square meter by location in Malmö",
x = "Price per square meter",
y = "Location"
)
We can show the types of properties in Malmö on a map.
You can see all the information about each property by clicking on the marker.
library(mapview)
library(sf)
library(RColorBrewer)
malmo_bbox = c(left = 12.9, bottom = 55.53, right = 13.1, top = 55.65)
df_filtered <- df %>%
filter(between(latitude, 55.53, 55.65), between(longitude, 12.9, 13.1)) %>%
slice_sample(n = 1000)
df_sf <- st_as_sf(df_filtered, coords = c("longitude", "latitude"), crs = 4326)mapview(df_sf, zcol = "type")We can also show the price per square meter on a map.
mapview(df_sf, zcol = "price_per_square_meter", col.regions=brewer.pal(9, "YlOrRd"))df %>% skimr::skim()| Name | Piped data |
| Number of rows | 45988 |
| Number of columns | 27 |
| _______________________ | |
| Column type frequency: | |
| character | 8 |
| numeric | 18 |
| POSIXct | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| title | 0 | 1.00 | 6 | 54 | 0 | 13428 | 0 |
| type | 538 | 0.99 | 4 | 22 | 0 | 12 | 0 |
| location | 538 | 0.99 | 12 | 60 | 0 | 1256 | 0 |
| agent_name | 15123 | 0.67 | 4 | 27 | 0 | 426 | 0 |
| agent_link | 15123 | 0.67 | 61 | 107 | 0 | 449 | 0 |
| price_development | 0 | 1.00 | 11 | 24 | 0 | 4066 | 0 |
| ownership_form | 0 | 1.00 | 5 | 23 | 0 | 7 | 0 |
| housing_association | 30647 | 0.33 | 4 | 56 | 0 | 2504 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| final_price | 0 | 1.00 | 2053857.07 | 1200883.74 | 1.00 | 1200000.00 | 1750000.00 | 2600000.00 | 1.7700e+07 | ▇▁▁▁▁ |
| price_per_square_meter | 2229 | 0.95 | 30343.74 | 12231.99 | 0.00 | 20769.00 | 29091.00 | 38500.00 | 9.5652e+04 | ▃▇▃▁▁ |
| starting_price | 66 | 1.00 | 1991311.80 | 1176311.87 | 13455.00 | 1152500.00 | 1695000.00 | 2495000.00 | 2.2000e+07 | ▇▁▁▁▁ |
| number_of_rooms | 52 | 1.00 | 2.48 | 0.96 | 1.00 | 2.00 | 2.00 | 3.00 | 8.0000e+00 | ▇▅▂▁▁ |
| living_area | 0 | 1.00 | 67.28 | 23.10 | 19.00 | 51.00 | 64.00 | 81.50 | 2.2600e+02 | ▇▇▁▁▁ |
| balcony | 27134 | 0.41 | 0.75 | 0.43 | 0.00 | 1.00 | 1.00 | 1.00 | 1.0000e+00 | ▂▁▁▁▇ |
| year_of_construction | 3132 | 0.93 | 1940.92 | 227.45 | 0.00 | 1938.00 | 1955.00 | 1973.00 | 1.9556e+04 | ▇▁▁▁▁ |
| fee | 2220 | 0.95 | 3858.72 | 1232.81 | 0.00 | 2990.00 | 3767.00 | 4598.00 | 1.3086e+04 | ▂▇▂▁▁ |
| operational_cost | 21932 | 0.52 | 7008.66 | 7768.70 | 12.00 | 3600.00 | 4800.00 | 6600.00 | 8.0500e+04 | ▇▁▁▁▁ |
| leasehold_fee | 45808 | 0.00 | 7681.44 | 3008.51 | 1885.00 | 5329.50 | 6992.50 | 10380.50 | 1.8460e+04 | ▂▇▅▂▁ |
| sale_year | 538 | 0.99 | 2019.22 | 2.70 | 2013.00 | 2017.00 | 2020.00 | 2021.00 | 2.0240e+03 | ▃▆▆▇▇ |
| sale_month | 538 | 0.99 | 6.35 | 3.33 | 1.00 | 3.00 | 6.00 | 9.00 | 1.2000e+01 | ▇▆▅▆▇ |
| sale_day | 538 | 0.99 | 16.00 | 8.63 | 1.00 | 9.00 | 16.00 | 23.00 | 3.1000e+01 | ▇▇▇▇▇ |
| floor_number | 27670 | 0.40 | 3.20 | 2.11 | 0.00 | 2.00 | 3.00 | 4.00 | 1.9000e+01 | ▇▃▁▁▁ |
| top_floor_number | 28478 | 0.38 | 5.36 | 2.60 | 1.00 | 4.00 | 5.00 | 6.00 | 2.0000e+01 | ▇▇▁▁▁ |
| elevator_presence | 28293 | 0.38 | 0.59 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.0000e+00 | ▆▁▁▁▇ |
| latitude | 0 | 1.00 | 55.59 | 0.04 | 55.51 | 55.58 | 55.59 | 55.60 | 5.9290e+01 | ▇▁▁▁▁ |
| longitude | 0 | 1.00 | 13.00 | 0.06 | 12.90 | 12.98 | 13.01 | 13.02 | 1.8120e+01 | ▇▁▁▁▁ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| sale_date | 538 | 0.99 | 2013-01-15 | 2024-04-04 | 2020-01-07 | 3891 |
# Define column names
colnames <- df %>% colnames()
# Define descriptions (replace with actual descriptions)
descriptions <- c(
"Final sale price (not original asking price)",
"Title of advert (property address)",
"Property type (e.g. apartment, house)",
"Suburb or area",
"Date of Sale",
"Name of estate agent",
"Link to estate agent website",
"Price per square meter",
"Asking price",
"Difference between asking price and final price",
"Ownership type (e.g. Äganderätt, Bostadsrätt)",
"Number of rooms",
"Living area in square meters",
"Balcony (1/0/NA)",
"Year of building construction",
"Avgift (monthly fee)",
"Driftskostnad (operating cost)",
"Leasehold fee",
"Föreningen (housing association)",
"Year of sale",
"Month of sale",
"Day of sale",
"Floor number (usually for apartments)",
"Number of floors in building",
"Elevator (1/0/NA)",
"Latitude",
"Longitude"
)
# Create data dictionary
data_dictionary <- tibble::tibble(
Column = colnames,
Description = descriptions
)
# Print data dictionary
library(gt)
data_dictionary %>%
gt::gt() %>%
gt::fmt_number(columns = 1, use_seps = FALSE) %>%
gt::tab_header(
title = md("**Data dictionary**"),
subtitle = "Description of columns in the dataset")| Data dictionary | |
| Description of columns in the dataset | |
| Column | Description |
|---|---|
| final_price | Final sale price (not original asking price) |
| title | Title of advert (property address) |
| type | Property type (e.g. apartment, house) |
| location | Suburb or area |
| sale_date | Date of Sale |
| agent_name | Name of estate agent |
| agent_link | Link to estate agent website |
| price_per_square_meter | Price per square meter |
| starting_price | Asking price |
| price_development | Difference between asking price and final price |
| ownership_form | Ownership type (e.g. Äganderätt, Bostadsrätt) |
| number_of_rooms | Number of rooms |
| living_area | Living area in square meters |
| balcony | Balcony (1/0/NA) |
| year_of_construction | Year of building construction |
| fee | Avgift (monthly fee) |
| operational_cost | Driftskostnad (operating cost) |
| leasehold_fee | Leasehold fee |
| housing_association | Föreningen (housing association) |
| sale_year | Year of sale |
| sale_month | Month of sale |
| sale_day | Day of sale |
| floor_number | Floor number (usually for apartments) |
| top_floor_number | Number of floors in building |
| elevator_presence | Elevator (1/0/NA) |
| latitude | Latitude |
| longitude | Longitude |
df %>%
count(type, sort = T) %>%
gt() %>%
tab_header(
title = md("**Property type**"),
subtitle = "Number of properties by type"
) %>%
cols_label(type = "Property type", n = "Number of properties")| Property type | |
| Number of properties by type | |
| Property type | Number of properties |
|---|---|
| Lägenhet | 42723 |
| Villa | 937 |
| NA | 538 |
| Radhus | 470 |
| Parhus | 383 |
| Fritidshus | 372 |
| Par-/kedje-/radhus | 221 |
| Fritidsboende | 182 |
| Kedjehus | 131 |
| Övrig | 22 |
| Tomt | 4 |
| Vinterbonat fritidshus | 4 |
| Gård/skog | 1 |
I am scraping this data because I would like to buy an apartment in Malmö in the future.↩︎